Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

rewrite nixpkgs's create-amis script using Terraform #254

Open
wants to merge 8 commits into
base: master
Choose a base branch
from

Conversation

AmineChikhaoui
Copy link
Member

@AmineChikhaoui AmineChikhaoui commented Jun 2, 2023

This rewrites create-amis.sh using terraform.
The create-amis.sh was pretty slow and switching to Terraform makes things more declarative and easier to read.
There are few things I don't like in the Terraform code, since we're copying the AMI to a set of regions (which require a different aws provider per region), the module system of terraform doesn't really work for this. The easy solution was to just use jq and make to generate the copy ami resources in JSON spec instead of HCL.
Even using Terraform, this still takes 15minutes which isn't that fast either but at least ami copying is done in parallel.

A note about usage of Terraform workspaces, since we don't want to trigger terraform replacements i.e delete and recreate of resources as we want to keep AMIs being used (and perhaps use a lifecycle to purge them periodically), I went with a workspace (statefile) per {release}.{arch}.{buildId}.

Things I'm planning to explore now that the building blocks are here:

@AmineChikhaoui AmineChikhaoui marked this pull request as ready for review June 5, 2023 02:40
@AmineChikhaoui
Copy link
Member Author

@zimbatm @grahamc if you can do a quick review or if there are any suggestions.

AWS OIDC setup is pointing to my fork/branch but can switch it before merging to nixos/nixos-org-configurations. I can also terraform the IAM role config for visibility.

Here is an example of the workflow runs: https://github.com/AmineChikhaoui/nixos-org-configurations/actions/runs/5172652543

@cole-h
Copy link
Member

cole-h commented Jun 6, 2023

Something that I just discovered is the aws_dlm_lifecycle_policy resource, which has an example for cross-region copy: https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/dlm_lifecycle_policy#example-cross-region-snapshot-copy-usage. Supposedly, setting policy_type to IMAGE_MANAGEMENT and resource_types to INSTANCE might work? Might be worth looking into, if it means we might be able to get rid of the jq and scripted processing.

Though I don't know if the cross-region copying means you'd need a provider for each region (as is currently necessary). I also don't know if the fact that it's for managing a lifecycle makes it unsuitable for this, though (i.e. if it requires you to specify something that would "clean up" images / etc and not just use it for cross-region copying).

EDIT: But the fact the documentation says "for the copied snapshot", "cross-Region snapshot copies", etc makes me think that this might not work exactly as I originally thought 🥲

EDIT2: There is also this aws_ami_copy resource that might be useful. I think it still requires a separate provider for each region we want to copy to, but...

@AmineChikhaoui
Copy link
Member Author

EDIT2: There is also this aws_ami_copy resource that might be useful. I think it still requires a separate provider for each region we want to copy to, but...

That's the one I'm using in regions.jq right ?

@cole-h
Copy link
Member

cole-h commented Jun 7, 2023

......yes. 🤦

If only there was some easier way to do this :')

@AmineChikhaoui
Copy link
Member Author

@cole-h probably CDK but not sure if it's worth the effort compared to a few lines of jq file 😄

I'll look into DLM, I was planning to explore it for managing the deprecation and removal of AMIs. Didn't see copy AMI handling specifically from a quick scroll over docs but can look a bit more.

RaitoBezarius added a commit to RaitoBezarius/nixpkgs that referenced this pull request Nov 22, 2023
The release management process involves uploading functional AMIs
at each release for NixOS.

As this script mentions, the 'entire thing' should be replaced with a new setup
involving Terraform.

Such a new setup can be found in NixOS/infra#254.

In the meantime, this script should not be used anymore until a better solution for AMI uploading is
found.

This means effectively that AMI uploading is not part of the release management process for further
releases.

Feel free to push forward the PR mentioned to get back AMI upload.
@nixos-discourse
Copy link

This pull request has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/ami-for-nixos-23-11/36860/4

@delroth
Copy link
Contributor

delroth commented Feb 16, 2024

Trying to understand the status of this PR - what is the relation between this work and the (currently operational, I think) https://github.com/NixOS/amis maintained by @arianvp? Are these complementary / parallel implementations / does one deprecate the other / ...?

(Bear with me, I have very little context, but I'd like to clean up this repo and make sure we have a clear set of pending items to look at :p)

@AmineChikhaoui
Copy link
Member Author

@delroth I didn't read through https://github.com/NixOS/amis yet but my understanding it will deprecate this process and this PR won't be needed anymore in that case.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants